Abstract

Dependency parsers are among the most important tools in natural language processing, with many applications in downstream tasks such as information retrieval, machine translation and knowledge acquisition. We introduce the Yara Parser, a fast and accurate open-source dependency parser based on the arc-eager algorithm and beam search. It achieves an unlabeled attachment score (UAS) of 93.32 on the standard WSJ test set, which ranks it among the top dependency parsers. At its fastest, Yara can parse about 4000 sentences per second in greedy mode (beam width 1). When optimizing for accuracy (using a beam width of 64 and Brown cluster features), Yara parses about 45 sentences per second. The parser can be trained on any syntactic dependency treebank, and several options are provided to make it flexible and tunable for specific tasks. It is released under the Apache License, Version 2.0, and can be used for both commercial and academic purposes. The parser is available at https://github.com/yahoo/YaraParser
1 Introduction


Dependency trees are one of the main representations used in the syntactic analysis of sentences. They show explicit syntactic dependencies among the words in a sentence [Kübler et al., 2009]. Many dependency parsers have been released in the past decade. Among them, graph-based and transition-based parsing are the two main approaches to dependency parsing. In graph-based models, the parser aims to find the most likely tree among all possible trees by using maximum spanning tree algorithms, often in conjunction with dynamic programming. In transition-based models, on the other hand, a tree is converted into a sequence of incremental actions, and the parser decides which action to commit to based on the current configuration of the partial tree. Graph-based parsers can achieve state-of-the-art performance with the guarantee of recovering the highest-scoring parse under their model, but usually at the expense of speed. Transition-based parsers, in contrast, are fast because the parser can greedily choose an action in each configuration, and they can use arbitrary non-local features to compensate for the lack of optimality. It is also easy to augment the set of actions to extend the functionality of the parser to tasks such as disfluency detection [Rasooli and Tetreault, 2013; Rasooli and Tetreault, 2014; Honnibal and Johnson, 2014] and punctuation prediction [Zhang et al., 2013a]. Transition-based parsers are mostly used in supervised settings, but in rare cases they are also used in unsupervised settings, either with little manual linguistic knowledge or with no prior knowledge [Daumé III, 200?; Rasooli and Faili, 2012].

In this report, we provide a brief introduction to our newly released dependency parser. We show that it achieves very high accuracy on the standard English WSJ test set, with results very close to the state of the art, and that it is fast even in its slowest mode. The report is structured as follows: in Section 2 we describe how to use Yara both from the command line and as an API. We provide technical details in Section 3, and experiments are reported in Section 4. Finally, we conclude in Section 5.


2 Using Yara in Practice 

In this section, we give a brief overview of training the parser and using it from the command line and as an API. Finally, we introduce a simple NLP pipeline that can parse text files. All technical details of the parser are provided in Section 3. The default settings of Yara are expected to give the best accuracy in practice (except for the number of training iterations, which depends on the data and feature settings).

2.1 Data format 

Yara uses the CoNLL 2006 dependency format¹ for both training and testing. CoNLL is a tabular format in which each word (and its information) occupies one line and sentences are separated by a blank line. Each line is organized into the following ten tab-delimited columns: 1) word index (starting at one), 2) word form, 3) lemma, 4) coarse-grained POS tag, 5) fine-grained POS (part-of-speech) tag, 6) an unordered set of syntactic and/or morphological features, separated by a vertical bar (|), 7) head of the current token (an integer head index, where 0 indicates the root), 8) dependency label, 9) projective head and 10) projective dependency label. Missing values are represented by an underscore. Yara uses only the first, second, fourth, seventh and eighth columns.
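To make the column layout concrete, here is a minimal Java sketch (not Yara's own reader) that keeps only the five columns Yara uses. The class name ConllReader and its Token type are hypothetical; the sketch assumes tab-separated lines with at least eight columns and blank lines between sentences.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader that keeps only the columns Yara uses: 1, 2, 4, 7 and 8.
class ConllReader {

    static final class Token {
        final int id;        // column 1: word index, starting at 1
        final String form;   // column 2: word form
        final String cpos;   // column 4: coarse-grained POS tag
        final int head;      // column 7: head index (0 = root)
        final String label;  // column 8: dependency label

        Token(int id, String form, String cpos, int head, String label) {
            this.id = id; this.form = form; this.cpos = cpos; this.head = head; this.label = label;
        }
    }

    /** Parses one tab-delimited CoNLL line. */
    static Token parseLine(String line) {
        String[] cols = line.split("\t");
        if (cols.length < 8) {
            throw new IllegalArgumentException("expected at least 8 columns: " + line);
        }
        return new Token(Integer.parseInt(cols[0]), cols[1], cols[3],
                Integer.parseInt(cols[6]), cols[7]);
    }

    /** Splits CoNLL text into sentences; a blank line ends a sentence. */
    static List<List<Token>> parseText(String text) {
        List<List<Token>> sentences = new ArrayList<>();
        List<Token> current = new ArrayList<>();
        for (String line : text.split("\n", -1)) {
            if (line.trim().isEmpty()) {
                if (!current.isEmpty()) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(parseLine(line));
            }
        }
        if (!current.isEmpty()) sentences.add(current);
        return sentences;
    }

    public static void main(String[] args) {
        String text = "1\tI\t_\tPRP\tPRP\t_\t2\tnsubj\t_\t_\n"
                    + "2\twant\t_\tVBP\tVBP\t_\t0\troot\t_\t_\n";
        for (Token t : parseText(text).get(0)) {
            System.out.println(t.form + " <- " + t.head + " (" + t.label + ")");
        }
    }
}
```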

2.2 Training and Model Selection 

The jar file in the package can be directly used to train a model with the following command line (run 
from the root directory of the project): 

» java -jar jar/YaraParser.jar train -train-file [train-file] -dev [dev-file] -model [model-file] -punc [punc-file]

where [train-file] and [dev-file] are CoNLL files with the training and development data, [model-file] is the output path for the trained model, and [punc-file] contains a list of POS tags for punctuation in the treebank (see Section 2.2.1). The model after each iteration is saved with the pattern [model-file]_iter[iter#]; e.g. model_iter2. In this way, the user can track the best-performing model and delete all the others. When no development data is available, the user can remove the -dev option from the command and, based on prior knowledge, use any of the saved model files as the final model (15 iterations is a reasonable choice).

¹ http://ilk.uvt.nl/conll/#dataformat
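As an illustration of the [model-file]_iter[iter#] naming scheme, the following hypothetical Java helper (not part of Yara) picks the iteration with the highest development UAS and lists the other per-iteration files that could be deleted. It assumes iterations are numbered from 1, as in model_iter2, and that the development scores have already been collected from Yara's output; the scores in main() are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model-selection helper; not part of Yara.
class ModelSelection {

    /** Builds the per-iteration file name, e.g. model_iter2. */
    static String modelPath(String modelFile, int iteration) {
        return modelFile + "_iter" + iteration;
    }

    /** Returns the 1-based iteration with the highest dev UAS (ties keep the earliest). */
    static int bestIteration(double[] devUas) {
        if (devUas.length == 0) throw new IllegalArgumentException("no iterations");
        int best = 0;
        for (int i = 1; i < devUas.length; i++) {
            if (devUas[i] > devUas[best]) best = i;
        }
        return best + 1;
    }

    /** Lists every per-iteration model file except the best one. */
    static List<String> filesToDelete(String modelFile, double[] devUas) {
        int best = bestIteration(devUas);
        List<String> others = new ArrayList<>();
        for (int iter = 1; iter <= devUas.length; iter++) {
            if (iter != best) others.add(modelPath(modelFile, iter));
        }
        return others;
    }

    public static void main(String[] args) {
        double[] devUas = {91.8, 92.6, 92.4}; // illustrative numbers only
        System.out.println("keep " + modelPath("model", bestIteration(devUas))); // keep model_iter2
        System.out.println("delete " + filesToDelete("model", devUas)); // delete [model_iter1, model_iter3]
    }
}
```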

The other options are as follows: 


• -cluster [cluster-file]: Brown cluster file; Yara supports at most 4096 clusters (default: empty). The format should be the same as https://github.com/percyliang/brown-cluster/blob/master/output.txt

• beam:[beam-width]; e.g. beam:16 (default is 64).

• iter:[training-iterations]; e.g. iter:10 (default is 20).

• unlabeled (default: labeled parsing, unless ‘unlabeled’ is explicitly given)

• lowercase (default: case-sensitive words, unless ‘lowercase’ is explicitly given)

• basic (default: use the extended feature set, unless ‘basic’ is explicitly given)

• static (default: use dynamic oracles, unless ‘static’ is explicitly given for static oracles)

• early (default: use max-violation update, unless ‘early’ is explicitly given for early update)

• random (default: choose the maximum-scoring oracle, unless ‘random’ is explicitly given to choose an oracle randomly)

• nt:[#threads]; e.g. nt:4 (default is 8).

• root_first (default: put ROOT in the last position, unless ‘root_first’ is explicitly given)
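The options above come in two shapes: key:value pairs (beam:, iter:, nt:) and bare flags that flip a default. The following Java sketch shows one way to interpret them; TrainOptions and its field names are hypothetical, and this is not Yara's actual argument parser. It covers only these trailing options, not -cluster or the other dash-prefixed arguments.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical interpretation of the trailing training options; defaults follow the list above.
class TrainOptions {
    int beamWidth = 64;
    int iterations = 20;
    int threads = 8;
    boolean labeled = true;          // 'unlabeled' turns this off
    boolean lowercase = false;       // 'lowercase' turns this on
    boolean extendedFeatures = true; // 'basic' turns this off
    boolean dynamicOracle = true;    // 'static' turns this off
    boolean maxViolation = true;     // 'early' turns this off
    boolean randomOracle = false;    // 'random' turns this on
    boolean rootFirst = false;       // 'root_first' turns this on

    static TrainOptions parse(List<String> tokens) {
        TrainOptions o = new TrainOptions();
        for (String tok : tokens) {
            if (tok.startsWith("beam:")) o.beamWidth = Integer.parseInt(tok.substring(5));
            else if (tok.startsWith("iter:")) o.iterations = Integer.parseInt(tok.substring(5));
            else if (tok.startsWith("nt:")) o.threads = Integer.parseInt(tok.substring(3));
            else switch (tok) {
                case "unlabeled": o.labeled = false; break;
                case "lowercase": o.lowercase = true; break;
                case "basic": o.extendedFeatures = false; break;
                case "static": o.dynamicOracle = false; break;
                case "early": o.maxViolation = false; break;
                case "random": o.randomOracle = true; break;
                case "root_first": o.rootFirst = true; break;
                default: throw new IllegalArgumentException("unknown option: " + tok);
            }
        }
        return o;
    }

    public static void main(String[] args) {
        TrainOptions o = parse(Arrays.asList("beam:16", "iter:10", "lowercase", "static", "early", "nt:4"));
        System.out.println(o.beamWidth + " " + o.iterations + " " + o.threads + " " + o.dynamicOracle); // 16 10 4 false
    }
}
```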


2.2.1 Punctuation Files 

In most dependency evaluations, punctuation symbols and their incoming arcs are ignored. Most parsers do this by using hard-coded rules for punctuation attachment. Yara instead lets the user specify which punctuation POS tags are relevant to their task by providing the path to a punctuation file ([punc-file]) with the -punc option (e.g. -punc punc_files/wsj.puncs). If no file is provided, Yara uses the WSJ punctuation tags. The punctuation file contains a list of punctuation POS tags, one per line. The Yara git repository provides punctuation files for WSJ data and for Google universal POS tags [Petrov et al., 2011].
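A minimal sketch of how such a file might be read and used, assuming one tag per line as described above. PunctuationTags is a hypothetical helper (not Yara's code); the tags in main() are illustrative and are not necessarily the contents of punc_files/wsj.puncs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

// Hypothetical loader for a punctuation file: one POS tag per line.
class PunctuationTags {

    /** Parses file contents; blank lines are ignored. */
    static Set<String> parse(String contents) {
        Set<String> tags = new HashSet<>();
        for (String line : contents.split("\\R")) {
            String tag = line.trim();
            if (!tag.isEmpty()) tags.add(tag);
        }
        return tags;
    }

    /** Reads a punctuation file such as the one passed with -punc. */
    static Set<String> load(Path file) throws IOException {
        return parse(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }

    /** A token is skipped in evaluation when its POS tag is listed in the file. */
    static boolean isIgnored(String posTag, Set<String> puncTags) {
        return puncTags.contains(posTag);
    }

    public static void main(String[] args) throws IOException {
        Set<String> punc = args.length > 0 ? load(Paths.get(args[0])) : parse(".\n,\n:\n``\n''");
        System.out.println(isIgnored(",", punc) + " " + isIgnored("NN", punc)); // true false
    }
}
```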

2.2.2 Some Examples 

Here we provide examples of training Yara with different settings. We chose examples that we think will be useful in practice.

Training with Brown clusters This can be done via the -cluster option. 

» java -jar jar/YaraParser.jar train -train-file [train-file] -dev [dev-file] -model [model-file] 
-punc [punc-file] -cluster [cluster-file] 


Training with the fastest mode This can be done via the basic and beam:1 options.

» java -jar jar/YaraParser.jar train -train-file [train-file] -dev [dev-file] -model [model-file]
-punc [punc-file] beam:1 basic

Changing the number of iterations This can be done via the iter option. In the following
example, we select 10 iterations.

» java -jar jar/YaraParser.jar train -train-file [train-file] -dev [dev-file] -model [model-file] 
-punc [punc-file] iter:10 


Increasing the memory limit The default maximum Java heap size may be smaller than what some data sets require. In those cases, the heap size can be increased with the JVM -Xmx option. In the following example, the limit is set to ten gigabytes.

» java -Xmx10g -jar jar/YaraParser.jar train -train-file [train-file] -dev [dev-file] -model
[model-file] -punc [punc-file]


Using very specific options The following example shows a specific case in which Yara trains a model on the training data (data/train.conll), uses data/dev.conll as development data, saves the model after each iteration under the given model file name (model/train.model_iter1, model/train.model_iter2, model/train.model_iter3, etc.), uses a language-specific punctuation list (punc_files/my_lang.puncs) and Brown cluster data (data/cluster.path), and trains an unlabeled model for 10 iterations with 16 beams and 4 threads, using static oracles, early update and ROOT in the first position. All words are lowercased beforehand (with the lowercase option).

» java -Xmx10g -jar jar/YaraParser.jar train -train-file data/train.conll -dev data/dev.conll
-model model/train.model -punc punc_files/my_lang.puncs -cluster data/cluster.path beam:16
iter:10 unlabeled lowercase static early nt:4 root_first


2.3 Test and Evaluation 

The test file can be either a CoNLL file or a POS tagged file. The output will be a file in CoNLL format. 

Parsing a CoNLL file 

» java -jar jar/YaraParser.jar parse_conll -input [test-file] -out [output-file] -model [model-file] 


Parsing a tagged file The tagged file is a simple file where words and tags are separated by a 
delimiter (default is underscore). The user can use the option -delim [delimiter] (e.g. -delim /) to 
change the delimiter. The output will be in CoNLL format. 

» java -jar jar/YaraParser.jar parse_tagged -input [test-file] -out [output-file] -model 
[model-file] 


Evaluation Both [gold-file] and [parsed-file] should be in CoNLL format. 

» java -jar YaraParser.jar eval -gold [gold-file] -parse [parsed-file] -punc [punc-file] 

A more detailed end-to-end example using a small amount of German training data² is available in Yara's GitHub repository at https://github.com/yahoo/YaraParser#example-usage
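For reference, attachment scores with punctuation excluded can be computed as below. This is a minimal Java sketch in the spirit of the eval command, not Yara's evaluator: the AttachmentScore class is hypothetical, punctuation is identified by the gold POS tag (an assumption), and both scores are reported as percentages. The example uses the tree of Figure 2 as gold, with one wrong head and one wrong label in the prediction.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical UAS/LAS computation that skips tokens whose gold POS tag is punctuation.
class AttachmentScore {

    /** Returns {UAS, LAS} in percent. */
    static double[] score(int[] goldHeads, String[] goldLabels, int[] predHeads, String[] predLabels,
                          String[] goldTags, Set<String> puncTags) {
        int total = 0, unlabeled = 0, labeled = 0;
        for (int i = 0; i < goldHeads.length; i++) {
            if (puncTags.contains(goldTags[i])) continue; // punctuation is not scored
            total++;
            if (goldHeads[i] == predHeads[i]) {
                unlabeled++;
                if (goldLabels[i].equals(predLabels[i])) labeled++;
            }
        }
        if (total == 0) return new double[] {0.0, 0.0};
        return new double[] {100.0 * unlabeled / total, 100.0 * labeled / total};
    }

    public static void main(String[] args) {
        Set<String> punc = new HashSet<>(Arrays.asList(".", ","));
        // "I want to parse a sentence ." with CoNLL heads (0 = root)
        int[] gold = {2, 0, 4, 2, 6, 4, 2};
        int[] pred = {2, 0, 4, 2, 4, 4, 2}; // "a" attached to the wrong head
        String[] goldLab = {"nsubj", "root", "aux", "xcomp", "det", "dobj", "punct"};
        String[] predLab = {"nsubj", "root", "aux", "xcomp", "det", "nmod", "punct"};
        String[] tags = {"PRP", "VBP", "TO", "VB", "DT", "NN", "."};
        double[] s = score(gold, goldLab, pred, predLab, tags, punc);
        System.out.println(String.format(Locale.ROOT, "UAS %.2f LAS %.2f", s[0], s[1])); // UAS 83.33 LAS 66.67
    }
}
```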

2.4 Parsing a Partial Tree 

Yara can parse partial trees: some gold dependencies are provided, and the parser is expected to return a dependency tree consistent with them. Unknown dependencies are represented with “-1” as the head in the CoNLL file. Figure 1 shows an example of a partial parse tree before and after constrained parsing.

² https://github.com/yahoo/YaraParser/tree/master/sample_data
[Figure 1 image: two dependency trees over “I want to parse a sentence . ROOT”]


Figure 1: A sample partial dependency tree on the left and its completed tree on the right. As shown in this figure, the added arcs are fully consistent with the partial tree arcs.


» java -jar YaraParser.jar parse_partial -input [test-file] -out [output-file] -model [model-file] 
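The consistency property described above can be checked after parsing. The following hypothetical Java helper (not part of Yara) compares the heads given in the partial input, where -1 marks an unknown head, with the heads of the completed tree. The full heads follow the arcs of Figure 2, written CoNLL-style with 0 for the root; the partial heads are made up. Because known arcs are kept only when the tree constraints can be satisfied, such a check reports whether any given arc was overridden.

```java
// Hypothetical check that a completed parse respects a partial tree (-1 = unknown head).
class PartialTreeCheck {

    static boolean isConsistent(int[] partialHeads, int[] fullHeads) {
        if (partialHeads.length != fullHeads.length) return false;
        for (int i = 0; i < partialHeads.length; i++) {
            // only tokens with a known head constrain the output
            if (partialHeads[i] != -1 && partialHeads[i] != fullHeads[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // "I want to parse a sentence ."
        int[] partial = {2, -1, -1, 2, -1, 4, -1};
        int[] full = {2, 0, 4, 2, 6, 4, 2};
        System.out.println(isConsistent(partial, full)); // true
    }
}
```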


2.5 Yara Pipeline 

We also provide an easy-to-use pipeline for using Yara in real applications. The pipeline relies on the OpenNLP³ tokenizer and sentence splitter and on our own POS tagger⁴. Thus the user has to download a sentence boundary detection model and a word tokenizer model for the target language from the OpenNLP website. It is also possible to train new sentence boundary detection and word tokenizer models with OpenNLP⁵.

The number of threads can be changed via the option nt:[#nt] (e.g. nt:10). The pipeline can be
downloaded from https://github.com/rasoolims/YaraPipeline

» java -jar jar/YaraPipeline.jar -input [input file] -output [output file] -parse_model [parse model file] -pos_model [pos model] -tokenizer_model [tokenizer model] -sentence_model [sentence detector model]


2.6 Pipeline API usage 

It is possible to use the Yara API directly⁶, but the pipeline provides an easier way to do so, at different levels of information. The user can set the number of threads for parsing with numberOfThreads.

2.6.1 Importing libraries 

The user should first import the libraries into the code as in Listing 1. The class YaraPipeline contains static methods for parsing, and ParseResult contains information about words, POS tags, dependency labels and heads, as well as the normalized tagging and parsing scores. Info contains all information about the parsing settings and the models for the parser, POS tagger, tokenizer and sentence boundary detector.

import edu.columbia.cs.rasooli.YaraPipeline.Structs.Info;
import edu.columbia.cs.rasooli.YaraPipeline.Structs.ParseResult;
import edu.columbia.cs.rasooli.YaraPipeline.YaraPipeline;

Listing 1: Code for importing necessary libraries.


2.6.2 Parsing Raw Text File 

In this case, we need models for parsing, tagging, tokenization and sentence boundary detection. Listing 2 shows such a case, where the parser writes the results in CoNLL format to [output_file].

³ http://opennlp.apache.org/index.html

⁴ https://github.com/rasoolims/SemiSupervisedPosTagger

⁵ For more information, please see the OpenNLP manual at https://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html

⁶ https://github.com/yahoo/YaraParser/blob/master/src/YaraParser/Parser/API_UsageExample.java
// put real file paths in the brackets (e.g. [parse_model]);
// numberOfThreads is an int chosen by the user
Info info1 = new Info("[parse_model]", "[pos_model]", "[tokenizer_model]", "[sentence_model]", numberOfThreads);
YaraPipeline.parseFile("[input_file]", "[output_file]", info1);

Listing 2: Code for parsing a raw text file


2.6.3 Parsing Raw Text 

Similar to parsing a file, we can parse raw text, as shown in Listing 3.

// put real file paths in the brackets (e.g. [parse_model])
Info info2 = new Info("[parse_model]", "[pos_model]", "[tokenizer_model]", "[sentence_model]", numberOfThreads);
String someText = "some text ....";
String conllOutputText2 = YaraPipeline.parseText(someText, info2);

Listing 3: Code for parsing raw text 


2.6.4 Parsing a Sentence 

For cases where the user has their own sentence splitter, individual sentences can be parsed as shown in Listing 4.

// put real file paths in the brackets (e.g. [parse_model])
Info info3 = new Info("[parse_model]", "[pos_model]", "[tokenizer_model]", numberOfThreads);
String someSentence = "some sentence.";
ParseResult parseResult3 = YaraPipeline.parseSentence(someSentence, info3);
String conllOutputText3 = parseResult3.getConllOutput();

Listing 4: Code for parsing a sentence 


2.6.5 Parsing a Tokenized Sentence 

Listing 5 shows an example for cases where the user only wants to use the parser and POS tagger to parse a pre-tokenized sentence.

// put real file paths in the brackets (e.g. [parse_model])
Info info4 = new Info("[parse_model]", "[pos_model]", numberOfThreads);
String[] someWords4 = {"some", "words", "..."};
ParseResult parseResult4 = YaraPipeline.parseTokenizedSentence(someWords4, info4);
String conllOutputText4 = parseResult4.getConllOutput();

Listing 5: Code for parsing a tokenized sentence 


2.6.6 Parsing a Tagged Sentence 

Listing 6 shows an example for cases where the user only wants to use Yara to parse a pre-tagged sentence.

// put real file paths in the brackets (e.g. [parse_model])
Info info5 = new Info("[parse_model]", numberOfThreads);
String[] someWords5 = {"some", "words", "..."};
String[] someTags5 = {"tag1", "tag2", "tag3"}; // one tag per word
ParseResult parseResult5 = YaraPipeline.parseTaggedSentence(someWords5, someTags5, info5);
String conllOutputText5 = parseResult5.getConllOutput();


Listing 6: Code for parsing a tagged sentence 
Act.              Stack                       Buffer                                          Arc(h,d)
Shift             []                          [I₁, want₂, to₃, parse₄, a₅, sentence₆, .₇, ROOT₈]
Left-arc(nsubj)   [I₁]                        [want₂, to₃, parse₄, a₅, sentence₆, .₇, ROOT₈]  nsubj(2,1)
Shift             []                          [want₂, to₃, parse₄, a₅, sentence₆, .₇, ROOT₈]
Shift             [want₂]                     [to₃, parse₄, a₅, sentence₆, .₇, ROOT₈]
Left-arc(aux)     [want₂, to₃]                [parse₄, a₅, sentence₆, .₇, ROOT₈]              aux(4,3)
Right-arc(xcomp)  [want₂]                     [parse₄, a₅, sentence₆, .₇, ROOT₈]              xcomp(2,4)
Shift             [want₂, parse₄]             [a₅, sentence₆, .₇, ROOT₈]
Left-arc(det)     [want₂, parse₄, a₅]         [sentence₆, .₇, ROOT₈]                          det(6,5)
Right-arc(dobj)   [want₂, parse₄]             [sentence₆, .₇, ROOT₈]                          dobj(4,6)
Reduce            [want₂, parse₄, sentence₆]  [.₇, ROOT₈]
Reduce            [want₂, parse₄]             [.₇, ROOT₈]
Right-arc(punct)  [want₂]                     [.₇, ROOT₈]                                     punct(2,7)
Reduce            [want₂, .₇]                 [ROOT₈]
Left-arc(root)    [want₂]                     [ROOT₈]                                         root(8,2)
DONE!             []                          [ROOT₈]

Figure 2: A sample action sequence with arc-eager actions for the dependency tree in Figure 1. The Stack and Buffer columns show the configuration before each action; the top of the stack is its rightmost item.


3 Yara Technical Details 

Yara is a transition-based dependency parser based on the arc-eager algorithm [Nivre, 2004]. It uses beam search training and decoding [Zhang and Clark, 2008] in order to mitigate local errors in parser decisions. The features of the parser are roughly the same as in [Zhang and Nivre, 2011], with additional Brown clustering [Brown et al., 1992] features⁷. Yara also includes several flexible parameters and options that let users easily tune it for the language and task at hand. Generally speaking, there are 128 possible combinations of the settings (the seven binary options unlabeled, lowercase, basic, static, early, random and root_first give 2⁷ = 128), in addition to tuning the number of iterations, the Brown clustering features and the beam width⁸.


3.1 Arc-Eager Algorithm 

As in the arc-eager algorithm, Yara has the following actions:

• Left-arc (LA): The first word in the buffer becomes the head of the top word in the stack (which must not already have a head). The top word is popped after this action.

• Right-arc (RA): The top word in the stack becomes the head of the first word in the buffer. The first word in the buffer is then pushed onto the stack.

• Reduce (R): The top word in the stack, which must already have a head, is popped.

• Shift (SH): The first word in the buffer is pushed onto the stack.

Depending on the position of the root, the constraints for initialization and actions differ. Figure 2 shows the transitions used to parse the sentence “I want to parse a sentence .”.
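The four actions can be sketched in a few lines of Java. This minimal sketch (not Yara's implementation) keeps ROOT in the last position, as in Yara's default, checks only the head preconditions of Left-arc and Reduce, and replays the action sequence of Figure 2; the class name ArcEager is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Minimal arc-eager sketch with ROOT in the last position; not Yara's implementation.
class ArcEager {
    final Deque<Integer> stack = new ArrayDeque<>();   // top of the stack = first element
    final Deque<Integer> buffer = new ArrayDeque<>();  // front of the buffer = first element
    final int[] head;                                  // head[d] = h, or -1 if d has no head yet
    final List<String> arcs = new ArrayList<>();       // arcs in the order they are added

    ArcEager(int n) {                                  // words 1..n, ROOT gets index n + 1
        for (int i = 1; i <= n + 1; i++) buffer.addLast(i);
        head = new int[n + 2];
        Arrays.fill(head, -1);
    }

    /** Shift: push the first word of the buffer onto the stack. */
    void shift() { stack.push(buffer.removeFirst()); }

    /** Left-arc: buffer front becomes head of the headless stack top, which is popped. */
    void leftArc(String label) {
        int d = stack.peek();
        if (head[d] != -1) throw new IllegalStateException("stack top already has a head");
        stack.pop();
        addArc(buffer.peekFirst(), d, label);
    }

    /** Right-arc: stack top becomes head of buffer front, which is then pushed onto the stack. */
    void rightArc(String label) {
        int d = buffer.removeFirst();
        addArc(stack.peek(), d, label);
        stack.push(d);
    }

    /** Reduce: pop the stack top, which must already have a head. */
    void reduce() {
        if (head[stack.peek()] == -1) throw new IllegalStateException("cannot reduce a headless word");
        stack.pop();
    }

    private void addArc(int h, int d, String label) {
        head[d] = h;
        arcs.add(label + "(" + h + "," + d + ")");
    }

    public static void main(String[] args) {
        // I(1) want(2) to(3) parse(4) a(5) sentence(6) .(7) ROOT(8), actions as in Figure 2
        ArcEager p = new ArcEager(7);
        p.shift(); p.leftArc("nsubj"); p.shift(); p.shift(); p.leftArc("aux");
        p.rightArc("xcomp"); p.shift(); p.leftArc("det"); p.rightArc("dobj");
        p.reduce(); p.reduce(); p.rightArc("punct"); p.reduce(); p.leftArc("root");
        // prints [nsubj(2,1), aux(4,3), xcomp(2,4), det(6,5), dobj(4,6), punct(2,7), root(8,2)]
        System.out.println(p.arcs);
    }
}
```

The final configuration has an empty stack and only ROOT in the buffer, matching the DONE! row of Figure 2.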

Unshift Action The original algorithm is not guaranteed to output a tree, and thus, on some occasions when the root is positioned at the beginning of the sentence, all words remaining in the stack end up connected to the root token. In [Nivre and Fernández-González, 2014], a new action and an “empty” flag are introduced to compensate for this problem and preserve the tree constraint. The action is called unshift: it pops the top word of the stack and returns it to the front of the buffer. We also added the unshift action for the cases where the root token is in the initial position of the sentence. This makes the parser more robust and gives a slight boost in performance⁹.
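A simplified Java sketch of the unshift idea for ROOT in the first position, under our assumptions: once the buffer has become empty, an “empty” flag is set and Shift is disallowed from then on, and Unshift may move a headless, non-ROOT stack top back to the front of the buffer. Labels and the remaining details of [Nivre and Fernández-González, 2014] are omitted, and UnshiftSketch is a hypothetical name. In the two-word example, plain arc-eager would be stuck with both words on the stack and no arcs, whereas unshift lets the parser finish the tree.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Simplified sketch of unshift with an "empty" flag; ROOT (index 0) is in the first position.
class UnshiftSketch {
    final Deque<Integer> stack = new ArrayDeque<>();
    final Deque<Integer> buffer = new ArrayDeque<>();
    final int[] head;          // head[d] = h, or -1 if d has no head yet
    boolean emptyFlag = false; // set once the buffer has become empty

    UnshiftSketch(int n) {     // ROOT starts on the stack, words 1..n in the buffer
        stack.push(0);
        for (int i = 1; i <= n; i++) buffer.addLast(i);
        head = new int[n + 1];
        Arrays.fill(head, -1);
    }

    private void markIfEmpty() { if (buffer.isEmpty()) emptyFlag = true; }

    boolean canShift() { return !emptyFlag && !buffer.isEmpty(); }

    /** Unshift needs an empty buffer and a headless, non-ROOT word on top of the stack. */
    boolean canUnshift() {
        return buffer.isEmpty() && !stack.isEmpty() && stack.peek() != 0 && head[stack.peek()] == -1;
    }

    void shift() {
        if (!canShift()) throw new IllegalStateException("shift not allowed");
        stack.push(buffer.removeFirst());
        markIfEmpty();
    }

    void leftArc() { int d = stack.pop(); head[d] = buffer.peekFirst(); }

    void rightArc() {
        int d = buffer.removeFirst();
        head[d] = stack.peek();
        stack.push(d);
        markIfEmpty();
    }

    /** Moves the headless stack top back to the front of the buffer; the flag stays set. */
    void unshift() {
        if (!canUnshift()) throw new IllegalStateException("unshift not allowed");
        buffer.addFirst(stack.pop());
    }

    public static void main(String[] args) {
        UnshiftSketch s = new UnshiftSketch(2);
        s.shift(); s.shift();   // buffer is now empty, so the flag is set
        s.unshift();            // word 2 goes back to the buffer
        s.leftArc();            // 2 becomes the head of 1
        s.rightArc();           // ROOT becomes the head of 2
        System.out.println(Arrays.toString(s.head)); // [-1, 2, 0]
    }
}
```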


3.2 Online Learning 


Most current supervised parsers use online learning algorithms. Online learners are fast, efficient and very accurate. We use the averaged structured perceptron [Collins, 2002], which is also used in previous similar parsers [Zhang and Clark, 2008; Zhang and Nivre, 2011; Choi and Palmer, 2011]. We use different engineering methods to speed up the parser, such as the averaging trick introduced in [Daumé III, 2006, Figure 2.3]. Furthermore, all features except the label set-lexical pair features are converted to long integer values to prevent frequent hash collisions and to decrease memory consumption. Semi-sparse weight vectors are used for additional speed-up, at the cost of higher memory consumption. The details of this implementation are beyond the scope of this report.

⁷ The idea of using Brown clustering features is inspired by [Koo et al., 2008; Honnibal and Johnson, 2014].
⁸ We use the best-performing setting as the default setting for Yara.
⁹ This problem happens less often with beam search than with greedy parsing.
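The averaging trick can be illustrated with a sparse averaged perceptron keyed by long feature ids. In this minimal Java sketch (a hypothetical class, not Yara's code), a second map u accumulates c·Δ for each update made at example counter c; after N examples (c = N + 1) the averaged weights are w − u/c, which equals the mean of the weight vectors w₀ = 0, w₁, …, w_N. Real implementations may differ in details such as whether w₀ is included in the average.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sparse averaged perceptron using the "w - u / c" averaging trick.
class AveragedPerceptron {
    private final Map<Long, Double> w = new HashMap<>(); // current weights
    private final Map<Long, Double> u = new HashMap<>(); // sum of c * delta over all updates
    private long c = 1;                                  // example counter

    double score(long[] features) {
        double s = 0.0;
        for (long f : features) s += w.getOrDefault(f, 0.0);
        return s;
    }

    /** Structured-perceptron update: reward gold features, penalize predicted features. */
    void update(long[] goldFeatures, long[] predictedFeatures) {
        for (long f : goldFeatures) add(f, 1.0);
        for (long f : predictedFeatures) add(f, -1.0);
    }

    private void add(long f, double delta) {
        w.merge(f, delta, Double::sum);
        u.merge(f, c * delta, Double::sum);
    }

    /** Call once after each training example, whether or not an update was made. */
    void tick() { c++; }

    /** Averaged weight of feature f, computed without storing every intermediate vector. */
    double averagedWeight(long f) {
        return w.getOrDefault(f, 0.0) - u.getOrDefault(f, 0.0) / c;
    }

    public static void main(String[] args) {
        AveragedPerceptron p = new AveragedPerceptron();
        p.update(new long[] {42L}, new long[] {7L}); // mistake on example 1
        p.tick();
        p.tick();                                    // example 2: prediction was correct
        // averages of {0, 1, 1} and {0, -1, -1}, i.e. about 2/3 and -2/3
        System.out.println(p.averagedWeight(42L) + " " + p.averagedWeight(7L));
    }
}
```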


3.3 Beam Search and Update Methods 


Early transition-based parsers such as MaltParser [Nivre et al., 2006] were greedy and trained in batch mode, by converting each tree into a set of independent actions. This has been shown to be less effective than a global search. Given our feature set, it is impossible to use dynamic programming to get the exact result, so we instead use beam search as an approximation¹⁰. Therefore, unlike in batch learning, the same procedure is used for training and decoding. Yara supports beam search and its default beam size is 64.
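The following minimal Java sketch shows generic beam search: every hypothesis is extended with every action and only the beamWidth highest-scoring extensions survive, so greedy decoding is the case beamWidth = 1. The Scorer interface and the toy scores are hypothetical and not part of Yara; in the toy example the greedy choice at the first step leads to a worse sequence that a beam of two avoids.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical generic beam search over fixed-length action sequences.
class BeamSearch {
    interface Scorer { double score(List<Integer> prefix, int action); }

    static final class Hyp {
        final List<Integer> actions;
        final double score;
        Hyp(List<Integer> actions, double score) { this.actions = actions; this.score = score; }
    }

    static Hyp search(int steps, int numActions, int beamWidth, Scorer scorer) {
        List<Hyp> beam = new ArrayList<>();
        beam.add(new Hyp(new ArrayList<>(), 0.0));
        for (int t = 0; t < steps; t++) {
            List<Hyp> candidates = new ArrayList<>();
            for (Hyp h : beam) {
                for (int a = 0; a < numActions; a++) {          // extend with every action
                    List<Integer> next = new ArrayList<>(h.actions);
                    next.add(a);
                    candidates.add(new Hyp(next, h.score + scorer.score(h.actions, a)));
                }
            }
            candidates.sort(Comparator.comparingDouble((Hyp h) -> h.score).reversed());
            beam = new ArrayList<>(candidates.subList(0, Math.min(beamWidth, candidates.size())));
        }
        return beam.get(0);                                      // best complete hypothesis
    }

    public static void main(String[] args) {
        // Action 0 looks best at step 1, but action 1 leads to a much better step 2.
        Scorer toy = (prefix, a) -> prefix.isEmpty() ? (a == 0 ? 1.0 : 0.9)
                                                     : (prefix.get(0) == 1 && a == 1 ? 5.0 : 0.0);
        System.out.println(search(2, 2, 1, toy).actions + " " + search(2, 2, 2, toy).actions); // [0, 0] [1, 1]
    }
}
```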

There are several ways to update the classifier weights with beam learning. A simple way is to take the best-scoring result of the beam search as the prediction and update the weights against the gold tree. This is known as “late update”, but it does not lead to good performance [Huang et al., 2012]. A more appealing way is to keep searching until the gold prediction falls out of the beam or the search reaches the end state. This is known as “early update” [Collins and Roark, 2004], and studies have shown a boost in performance relative to late update [Collins and Roark, 2004; Zhang and Clark, 2008]. The main problem with early update is that it does not update the weights according to the maximally violated prediction. A “max-violation” is the search step at which the gold prefix is out of the beam and the gap between the score of the best-scoring beam item and the score of the gold prefix is largest. With max-violation update [Huang et al., 2012], the learner updates the weights at the max-violation step. In other words, the max-violation is the worst mistake that the classifier makes in the beam compared to the gold action sequence. Yara supports both early and max-violation update, while ZPar [Zhang and Nivre, 2011] only supports early update and RedShift [Honnibal and Johnson, 2014] only supports max-violation. Yara's default update method is max-violation.
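To make the difference between the two update strategies concrete, here is a hypothetical Java helper (not Yara's code) that picks the step at which each strategy would update, given per-step scores of the best beam item and of the gold prefix. It follows the definitions above: early update stops at the first step where the gold prefix leaves the beam, while max-violation picks, among the steps where the gold prefix is out of the beam, the one with the largest score gap. The numbers in main() are made up.

```java
// Hypothetical helper showing where early update and max-violation update apply.
class UpdateStep {

    /** Early update: first step at which the gold prefix falls out of the beam (-1 if never). */
    static int earlyUpdateStep(boolean[] goldInBeam) {
        for (int t = 0; t < goldInBeam.length; t++) {
            if (!goldInBeam[t]) return t;
        }
        return -1;
    }

    /** Max-violation: among steps with gold out of the beam, the largest best-minus-gold gap. */
    static int maxViolationStep(boolean[] goldInBeam, double[] bestScore, double[] goldScore) {
        int best = -1;
        double maxGap = Double.NEGATIVE_INFINITY;
        for (int t = 0; t < goldInBeam.length; t++) {
            if (goldInBeam[t]) continue;
            double gap = bestScore[t] - goldScore[t];
            if (gap > maxGap) {
                maxGap = gap;
                best = t;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        boolean[] goldInBeam = {true, true, false, false, false};
        double[] best = {1.0, 2.0, 4.0, 7.0, 8.0};
        double[] gold = {1.0, 1.5, 3.0, 4.0, 6.0};
        System.out.println(earlyUpdateStep(goldInBeam) + " " + maxViolationStep(goldInBeam, best, gold)); // 2 3
    }
}
```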


3.4 Dynamic and Static Oracles 

With the standard transition-based parsing algorithms, the same parse tree can be produced by different action sequences. In other words, different search paths may lead to the same parse tree. Most off-the-shelf parsers, such as ZPar [Zhang and Nivre, 2011], define manual rules for recovering a single gold action sequence (oracle) to give to the learner. This is known as a static oracle. The alternative is to make the oracle dynamic and let the learner choose among the oracles [Goldberg and Nivre, 201?]. Yara supports both static and dynamic oracles. In the case of dynamic oracles, only zero-cost explorations are allowed. In [Goldberg and Nivre, 201?], the gold oracle can be chosen randomly, but we also provide another option that selects the best-scoring oracle. The latter is known as the latent structured perceptron [Sun et al., 2013], treating the gold tree as the structure and each oracle as a latent path for reaching it. Our experiments show that using the highest-scoring oracle gives slightly better results, so we make it the default option for training.
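A minimal Java sketch of the two ways of picking among zero-cost actions: the highest-scoring one under the current model (Yara's default according to the text above) or a uniformly random one. Action ids, scores and the OracleChoice class are hypothetical, and computing the zero-cost set itself is outside the sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical selection among zero-cost (dynamic-oracle) actions.
class OracleChoice {

    /** Latent-perceptron style: the zero-cost action the current model scores highest. */
    static int highestScoring(List<Integer> zeroCostActions, double[] modelScores) {
        int best = zeroCostActions.get(0);
        for (int a : zeroCostActions) {
            if (modelScores[a] > modelScores[best]) best = a;
        }
        return best;
    }

    /** Random choice among the zero-cost actions. */
    static int random(List<Integer> zeroCostActions, Random rng) {
        return zeroCostActions.get(rng.nextInt(zeroCostActions.size()));
    }

    public static void main(String[] args) {
        List<Integer> zeroCost = Arrays.asList(0, 2, 3);
        double[] scores = {5.0, 9.0, 1.0, 2.5}; // action 1 scores highest but is not zero-cost
        System.out.println(highestScoring(zeroCost, scores)); // 0
        System.out.println(random(zeroCost, new Random(42)));
    }
}
```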

3.5 Other Properties 

Root Position In [Ballesteros and Nivre, 2013] , it is shown that the position of the root token has a 
significant effect on the parser performance. We allow the root to be either in the initial or final position 
in the sentence. The final position is the default option for Yara parser. 

Features We use roughly the same feature set as [Zhang and Nivre, 2011]. The extended feature set is the default, but the user can switch to the basic set of local features with the basic option to improve speed at some loss in accuracy. We also add extra features from Brown word clusters [Brown et al., 1992], as used in [Koo et al., 2008]: for the first word in the buffer and in the stack, we use the 4- and 6-bit prefixes of the cluster bit string in place of part-of-speech tags and the full cluster bit string in place of the word. Using all the features gives a boost in performance, but at the expense of speed.
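A minimal Java sketch of the Brown-cluster feature strings described above for a single word: the 4- and 6-bit prefixes of its cluster bit string and the full bit string. The feature-name prefixes (bc4=, bc6=, bcFull=) and the BrownFeatures class are hypothetical; how these strings are combined into feature templates is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Brown-cluster feature strings for one word.
class BrownFeatures {

    static List<String> features(String clusterBits) {
        List<String> out = new ArrayList<>();
        if (clusterBits == null) return out; // word not covered by the cluster file
        // short prefixes stand in for POS tags, the full bit string stands in for the word
        out.add("bc4=" + clusterBits.substring(0, Math.min(4, clusterBits.length())));
        out.add("bc6=" + clusterBits.substring(0, Math.min(6, clusterBits.length())));
        out.add("bcFull=" + clusterBits);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(features("110100101")); // [bc4=1101, bc6=110100, bcFull=110100101]
    }
}
```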

¹⁰ Greedy search can be viewed as beam search with a beam size of one.

Unlabeled Parsing Although the parser is designed for labeled parsing, unlabeled parsing is also 
available through command line options. This is useful for the cases where the user simply needs a very 
fast parser and does not care about the loss in performance or the lack of label information. 

Partial Parsing There are occasions, especially in semi-supervised parsing, where we have partial information about the tree; for example, we may know that the third word is the subject of the first word. With partial parsing, the user benefits from dynamic oracles to parse partial trees such that known arcs are preserved unless the tree constraints cannot be satisfied.

Multithreading Since current machines have multiple processor cores, many of which support hyper-threading, we added support for multithreading. When processing a file, the parser uses sentence-level multithreading (i.e. it parses sentences in parallel but outputs them in the same order as in the input). When using the API, it is possible to use multithreading at the beam level. Beam-level multithreading is slower than sentence-level multithreading. We also use beam-level multithreading for training the parser, which significantly speeds up the training phase. Yara's default is 8 threads, but the user can easily change it.
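Sentence-level parallelism that keeps the input order can be sketched with the standard java.util.concurrent API: each sentence becomes a task, and ExecutorService.invokeAll returns the futures in task order. The parser function here is a hypothetical stand-in for a real parser call, and the sketch is not Yara's own implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical order-preserving sentence-level parallel parsing.
class ParallelParse {

    static List<String> parseAll(List<String> sentences, Function<String, String> parser, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String sentence : sentences) {
                tasks.add(() -> parser.apply(sentence)); // one task per sentence
            }
            List<String> results = new ArrayList<>();
            // invokeAll returns futures in task order, so output order = input order
            for (Future<String> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> out = parseAll(Arrays.asList("a b", "c", "d e f"), s -> "parsed(" + s + ")", 8);
        System.out.println(out); // [parsed(a b), parsed(c), parsed(d e f)]
    }
}
```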

Model Selection Unlike most current parsers, Yara saves the model file for all training iterations 
and lets the user choose the best performing model based on the performance on the development data. 
It also reports the performance on the development data to make it easier for the users to select the best 
model. 

Tree Scoring Yara also has the option to output the parse score to a text file. The score is the 
perceptron score divided by the sentence length. 

Lowercasing In some cases, such as spoken language parsing, no casing information is available and it is better to train on lowercased text. Yara supports this with the lowercase option in training.


4 Experiments 

In this section we show how Yara performs on two different data sets and compare its performance to other leading parsers. We also show the trade-off between beam width and accuracy, and how accuracy changes with the number of training iterations. For all experiments we use version 0.2 of Yara, running on a multi-core 2.00GHz Intel Xeon machine. The machine has twenty cores, but we use only 8 threads (the parser's default) in all experiments.

4.1 Parsing WSJ Data 

We use the standard WSJ train-dev-test split for our experiments. As in [Zhang and Nivre, 2011], we first converted the WSJ data [Marcus et al., 1993] with Penn2Malt¹¹. Next, automatic POS tags were generated for the whole data set with version 0.2 of our POS tagger¹², using 10-way jackknifing on the training data. The tagger is a 20-beam third-order tagger trained with the max-violation strategy, with the same settings as in [Collins, 2002] along with additional Brown clustering features [Liang, 2005]¹³. It achieved POS tagging accuracies of 97.14, 97.18 and 97.37 on the training, development and test files, respectively.

Table 1 shows the results on WSJ data when varying the beam size and the use of Brown clusters. A comparison with prior work is made in Table 2. All unlabeled attachment scores (UAS) and labeled attachment scores (LAS) are computed with punctuation excluded. As seen in Table 2, Yara's accuracy is very close to the state of the art for transition-based parsers [Bohnet and Nivre, 2012].

¹¹ http://stp.lingfil.uu.se/~nivre/research/Penn2Malt.html
¹² https://github.com/rasoolims/SemiSupervisedPosTagger
¹³ We use the pre-built Brown cluster features from http://metaoptimize.com/projects/wordreprs/ with 1000 word classes.
Parser  Beam  Features              Iter#  Dev UAS  Test UAS  Test LAS  Sent/Sec
Yara    1     ZN (basic+unlabeled)  5      89.29    88.73     -         3929
Yara    1     ZN (basic)            6      89.54    89.34     88.02     3921
Yara    1     ZN + BC               13     89.98    89.74     88.52     1300
Yara    64    ZN                    13     93.31    92.97     91.93     133
Yara    64    ZN + BC               13     93.42    93.32     92.32     45

Table 1: Parsing accuracies of Yara on WSJ data. BC stands for Brown cluster features, UAS for unlabeled attachment score, LAS for labeled attachment score and ZN for the features of [Zhang and Nivre, 2011]. Sent/Sec refers to the speed in sentences per second.



Parser                         UAS    LAS
Graph-based
McDonald et al., 2005          90.9   -
McDonald and Pereira, 2006     91.5   -
Sagae and Lavie, 2006          92.7   -
Koo and Collins, 2010          93.04  -
Zhang and McDonald, 2012       93.06  -
Martins et al., 2013           93.07  -
Qian and Liu, 2013             93.17  -
Ma and …                       93.4   -
Zhang et al., …                93.50  92.41
Zhang and McDonald, 2014       93.82  92.74
Transition-based
Nivre et al., 200?             88.1   86.3
Zhang and Clark, 2008          92.1   -
Huang and Sagae, 2010          92.1   -
Zhang and Nivre, 2011          92.9   91.8
Bohnet and Nivre, 2012         93.38  92.44
Choi and McCallum, 2013        92.96  91.93
Yara                           93.32  92.32

Table 2: Parsing accuracies on WSJ data. We only report results that use the standard train-dev-test splits and do not make use of additional training data (as in self-training). The first block of rows lists graph-based parsers and the second block transition-based parsers (including Yara).


Effect of Beam Size Choosing a reasonable beam size is essential in some NLP applications, as there is always a trade-off between speed and performance. As shown in Figure 3, beyond a beam size of eight the accuracy changes much less than, for example, between beam sizes one and two. This is useful because changing the beam size from 64 to 8 speeds up parsing by more than a factor of three (from 45 to 167 sentences per second in Table 3) with a small loss in accuracy (93.42 vs. 93.03 development UAS).


Beam size          1 (ub)  1 (b)   1       2       4       8       16      32      64
Dev UAS            89.29   89.54   89.98   91.95   92.80   93.03   93.27   93.22   93.42
Speed (sent/sec)   3929    3921    1300    370     280     167     110     105     45

Table 3: Speed vs. performance trade-off when using Brown clustering features and parsing CoNLL files with eight threads (except 1 (ub) and 1 (b), which are unlabeled and labeled parsing with basic features). The numbers are averaged over 20 training iterations, parsing the development set after each iteration.


4.2 Parsing Non-Projective Languages: Persian 

Yara can only be trained on projective trees, and thus some loss in accuracy is expected for non-projective languages. We use version 1.1 of the Persian dependency treebank (PerDT)
Figure 3: The influence of beam size on performance at each training iteration of Yara. Yara is trained with Brown clusters in all of the experiments in this figure.


Model                            Unlabeled accuracy  Labeled accuracy
Mate v3.6.1                      91.32               87.68
Yara (without Brown clusters)    89.52               85.77
Yara (with Brown clusters)       89.97               86.32

Table 4: Parsing results on the Persian treebank, excluding punctuation.


[Rasooli et al., 2013]¹⁴ and tagged it with the same settings as the WSJ data. We tokenized the Mizan corpus¹⁵ and added it to our training data to create 1000 Brown clusters¹⁶. The training data contains 22% non-projective trees. We use the Mate parser (v3.6.1) [Bohnet, 2010], a highly accurate non-projective parser, for comparison with Yara. Table 4 shows the performance of the two parsers. There is a gap of 1.35 points in unlabeled accuracy (91.32 vs. 89.97), but given that 22% of the trees (about 2.5% of the arcs) are non-projective, this gap is reasonable.


5 Conclusion and Future Work 

We presented an introduction to our open-source dependency parser and showed that it is very fast and accurate. The parser can also be used for non-projective languages with a small loss in performance. We believe that our parser can be useful in different downstream tasks given its performance and flexible license. Our future plans include extending the parser to handle non-projectivity and using continuous-valued representations such as word embeddings as features to improve its accuracy.

¹⁴ http://www.dadegan.ir/catalog/perdt
¹⁵ http://www.dadegan.ir/catalog/mizan

¹⁶ The definition of Brown clusters in this data is loose, because there are multi-word verbs in the treebank while Brown clusters are learned from single words. Therefore, multi-word verbs in the treebank do not get any Brown cluster assignment, which causes a slight loss in performance.